A Unified Totally-Data-Driven Framework for Duration and Intonation Modeling
Authors
Abstract
This paper proposes a unified framework for duration and intonation modeling in Mandarin TTS. Within this framework, we design a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. With this representation, an F0 vector can be decomposed into three orthogonal polynomial parameters, each a continuous scalar. Building on this vector-to-scalar decomposition, both duration and the F0 representation parameters can be predicted from linguistic and phonetic attributes by generalized linear models (GLMs) in a unified manner. The GLM coefficients are trained from data. Furthermore, the model structure, i.e., the significant attributes and attribute interactions in the GLM, is optimized automatically from data rather than chosen by intuition. The proposed framework is therefore totally data-driven. In objective evaluations, the new approach achieves prediction performance comparable to or better than that of other strong approaches. Informal subjective perceptual experiments suggest that the predicted duration and intonation are appropriate and natural.
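The vector-to-scalar decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say which orthogonal polynomial family, sampling grid, or normalization is used, so everything here is an assumption: the basis is a discrete orthonormal polynomial basis of degree 2 obtained by QR factorization of a Vandermonde matrix, and the function names (`orthogonal_poly_basis`, `decompose_f0`, `reconstruct_f0`) and the sample contour are hypothetical.

```python
import numpy as np


def orthogonal_poly_basis(n_points, degree=2):
    """Return an (n_points, degree + 1) matrix with orthonormal polynomial columns.

    Assumption: the basis is built by orthogonalizing 1, t, t^2 on a uniform
    grid in [-1, 1]; the paper may use a different polynomial family.
    """
    t = np.linspace(-1.0, 1.0, n_points)
    vander = np.vander(t, degree + 1, increasing=True)  # columns: 1, t, t^2
    q, _ = np.linalg.qr(vander)  # orthonormalize the columns
    return q


def decompose_f0(f0, degree=2):
    """Project an F0 vector onto the basis, giving degree + 1 scalar parameters."""
    f0 = np.asarray(f0, dtype=float)
    basis = orthogonal_poly_basis(len(f0), degree)
    params = basis.T @ f0  # one scalar per basis polynomial
    return params, basis


def reconstruct_f0(params, basis):
    """Rebuild the approximated F0 contour from the scalar parameters."""
    return basis @ params


# Hypothetical F0 samples (Hz) for one syllable, for illustration only.
f0 = np.array([210.0, 205.0, 198.0, 190.0, 185.0, 183.0, 184.0, 188.0])
params, basis = decompose_f0(f0)
approx = reconstruct_f0(params, basis)
rmse = float(np.sqrt(np.mean((f0 - approx) ** 2)))
print("parameters:", params)
print("approximation RMSE (Hz):", rmse)
```

Because the basis columns are orthonormal, each parameter is computed independently of the others, which is what would allow each scalar to be predicted by its own model from linguistic and phonetic attributes.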
Similar resources
A Multi-Formalism Modeling Framework: Formal Definitions, Model Composition and Solution Strategies
In this paper, we present a multi-formalism modeling framework (abbreviated by MFMF) for modeling and simulation. The proposed framework is defined based on the concepts of meta-models and uses object-orientation to overcome the complexities and to enhance the extensibility. The framework can be used as a basis for modeling by various formalisms and to support model composition in a unified man...
Totally data-driven intonation prediction model using a novel F0 contour parametric representation
This paper proposes a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. The polynomial is a simplified representation of Parallel Encoding and Target Approximation (PENTA) intonation model that includes a target component and an approximation component. We also propose predicting the polynomial parameters from linguistic and phonetic attributes...
A proposal to quantitatively select the right intonation unit in data-driven intonation modeling
In this work, we provide a procedure for the systematic evaluation of the quantitative impact of the selection of the basic intonation unit for data-driven intonation modeling. Taking advantage of the corpus based modeling technique previously developed, we show how the number of prosodic features selected and the kind of basic unit determines the final prediction RMSE of the synthesized F0 pro...
A Unified Framework for Delineation of Ambulatory Holter ECG Events via Analysis of a Multiple-Order Derivative Wavelet-Based Measure
In this study, a new algorithm for detecting and delineating major events in long-duration Holter electrocardiograms (ECG) is described, which operates on the false-alarm error bounded segmentation of a decision statistic with a simple mathematical origin. To this end, three-lead Holter data are first pre-processed by an appropriate bandpass finite-duration impulse response (FIR) f...